Verification of dtypes of columns of X_row* is same that self.X #300
base: master
…t problems in the prediction step.
@@ -1791,3 +1796,25 @@ def get_xgboost_preds_df(xgbmodel, X_row, pos_label=1):
             0, "pred_proba"
         ]
     return xgboost_preds_df
+
+
+def check_dtype_of(
Could you add some tests for this? How flexible is it (e.g. will it break on float32 vs. float64, int vs. float, etc.)?
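To make the question concrete, here is a minimal sketch of how strict an element-wise dtype comparison is. It uses the same `dtypes.eq(...)` pattern that appears in the diff; the frames `df_origin` and `df_target` and their columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: same column names, different numeric dtypes.
df_origin = pd.DataFrame(
    {
        "x": np.array([1.0, 2.0], dtype="float64"),
        "n": np.array([1, 2], dtype="int64"),
    }
)
df_target = pd.DataFrame(
    {
        "x": np.array([1.0, 2.0], dtype="float32"),  # float32 vs float64
        "n": np.array([1.0, 2.0], dtype="float64"),  # float vs int
    }
)

# Element-wise dtype comparison, aligned on column name.
same = df_target.dtypes.eq(df_origin.dtypes)
all_match = bool(same.all())
print(same)
print(all_match)
```

Both columns compare as unequal, so a check written this way treats float32 vs. float64 and int vs. float as mismatches. Test cases for the new function could assert that behavior explicitly.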
@@ -50,6 +50,7 @@
 
 
 from .explainer_methods import *
+from .explainer_methods import check_dtype_of
You can add check_dtype_of to the __all__ at the start of explainer_methods.py; then it is covered by the import *. (Generally import * is frowned upon, but it's okay as long as you define a restrictive __all__.)
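A small sketch of the mechanism described here: `from module import *` only pulls in the names listed in `__all__`. The module `demo_explainer_methods` and its helper functions are hypothetical stand-ins built in memory so the example is self-contained; they are not the project's actual file.

```python
import sys
import types

# Hypothetical stand-in for explainer_methods.py.
source = '''
__all__ = ["check_dtype_of"]

def check_dtype_of():
    return "listed in __all__"

def unlisted_helper():
    return "not listed in __all__"
'''

demo = types.ModuleType("demo_explainer_methods")
exec(source, demo.__dict__)
sys.modules["demo_explainer_methods"] = demo

# Run a star import into a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_explainer_methods import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only `check_dtype_of` arrives in the namespace; `unlisted_helper` stays private to the module, which is why a restrictive `__all__` makes the star import tolerable.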
@@ -241,7 +242,9 @@ def __init__(
             col for col in self.regular_cols if not is_numeric_dtype(self.X[col])
         ]
         self.categorical_dict = {
-            col: sorted(self.X[col].unique().tolist()) for col in self.categorical_cols
+            col: sorted(
+                v for v in self.X[col].unique().tolist() if not pd.isna(v)
Not an expert on LightGBM, but wouldn't there be use cases where NA would be a category? Or is that handled differently? How about in CatBoost or other libraries?
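For context, a minimal sketch of what the filter in the diff changes, assuming a string column with missing values (the series `col` is invented for illustration). Sorting a list that mixes strings with a float NaN raises a `TypeError`, and dropping missing values first avoids that, at the cost of NaN no longer appearing among the categories.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column containing a missing value.
col = pd.Series(["b", "a", np.nan, "b"], dtype=object)
values = col.unique().tolist()

# Sorting strings together with a float NaN is not supported.
try:
    sorted(values)
    sort_failed = False
except TypeError:
    sort_failed = True

# Same filter as in the diff: drop missing values before sorting.
categories = sorted(v for v in values if not pd.isna(v))
print(sort_failed, categories)
```

Whether NaN should instead be kept as its own category is exactly the open question raised in this comment; the sketch only shows the behavior of the filter as written.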
        df_target is not None and
        not df_target[features].dtypes.eq(df_origin[features].dtypes).all()
    ):
        df_target[features] = df_target[features].astype(
In general I'm not a fan of these functions that modify in place. Could you rewrite it such that it returns the transformed df instead? Then maybe call it adjust_dtypes_to_match_df(...) or something? Calling something check_dtype_of when it actually modifies one of the arguments is confusing.
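One possible shape for the non-mutating version suggested here, as a sketch only. The name `adjust_dtypes_to_match_df` comes from the comment, but the signature, parameter order, and `None` handling are assumptions, not the code that was eventually merged.

```python
import numpy as np
import pandas as pd


def adjust_dtypes_to_match_df(df_target, df_origin, features):
    """Return a copy of df_target whose `features` columns use df_origin's dtypes.

    Hypothetical signature; df_target itself is never modified.
    """
    if df_target is None:
        return None
    dtype_map = {col: df_origin[col].dtype for col in features}
    # DataFrame.astype returns a new frame, so the caller's frame is untouched.
    return df_target.astype(dtype_map)


# Usage with made-up data.
df_origin = pd.DataFrame(
    {"a": np.array([1.0, 2.0], dtype="float64"), "b": np.array([1, 2], dtype="int64")}
)
df_target = pd.DataFrame(
    {"a": np.array([1, 2], dtype="int64"), "b": np.array([3.0, 4.0], dtype="float32")}
)
adjusted = adjust_dtypes_to_match_df(df_target, df_origin, ["a", "b"])
print(adjusted.dtypes)
```

Returning a new frame keeps the caller's data intact and makes the function easy to test: the assertions only need to compare the returned dtypes with those of `df_origin`.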
Cool, thanks! Tests are passing, but please have a look at my comments and see if you can add a few test cases for this new function...
Hello, I will make the requested changes as soon as possible (next week). Thanks.
Hello, I want to contribute by fixing two different bugs related to the use of LightGBM.